Memory-Based Learning
- North America > United States (0.46)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Case Based Reasoning (0.40)
Randomized Masked Finetuning: An Efficient Way to Mitigate Memorization of PIIs in LLMs
The current literature on memorization in Natural Language Models, especially Large Language Models (LLMs), poses severe security and privacy risks, as models tend to memorize personally identifying information (PIIs) from training data. We introduce Randomized Masked Fine-Tuning (RMFT), a novel privacy-preserving fine-tuning technique that reduces PII memorization while minimizing performance impact. Using the Enron Email Dataset, we demonstrate that RMFT achieves an 80.81% reduction in Total Extraction Rate and 80.17% reduction in Seen Extraction Rate compared to baseline fine-tuning, outperforming deduplication methods while maintaining only a 5.73% increase in perplexity. We present MaxTER, a Pareto-optimal evaluation framework for assessing privacy-utility tradeoffs, and show the performance of RMFT vs Deduplication by Area Under The Response Curve (AURC) metric.
- North America > United States > California (0.14)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.88)
Formal Abductive Latent Explanations for Prototype-Based Networks
Soria, Jules, Chihani, Zakaria, Girard-Satabin, Julien, Grastien, Alban, Xu-Darme, Romain, Cancila, Daniela
Case-based reasoning networks are machine-learning models that make predictions based on similarity between the input and prototypical parts of training samples, called prototypes. Such models are able to explain each decision by pointing to the prototypes that contributed the most to the final outcome. As the explanation is a core part of the prediction, they are often qualified as ``interpretable by design". While promising, we show that such explanations are sometimes misleading, which hampers their usefulness in safety-critical contexts. In particular, several instances may lead to different predictions and yet have the same explanation. Drawing inspiration from the field of formal eXplainable AI (FXAI), we propose Abductive Latent Explanations (ALEs), a formalism to express sufficient conditions on the intermediate (latent) representation of the instance that imply the prediction. Our approach combines the inherent interpretability of case-based reasoning models and the guarantees provided by formal XAI. We propose a solver-free and scalable algorithm for generating ALEs based on three distinct paradigms, compare them, and present the feasibility of our approach on diverse datasets for both standard and fine-grained image classification. The associated code can be found at https://github.com/julsoria/ale
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > France > Île-de-France (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.66)
Title
In this section, we formalize and substantiate the claims of Theorem 1 . Theorem 1 has three parts, which we address in the following sections. First, in Section A.2, we show that the classifier makes progress during the early-learning phase: over the first We prove this rigorously in Section A.3, which shows that the overall magnitude of the gradient terms Finally, in Section A.4, we prove In terms of and ", the gradient ( 2) reads rL We will use the phrase "with high probability" to denote an event which happens with probability We will prove the claim by induction. We proceed with the induction. We now show that the classifier's accuracy on the mislabeled This proves the first claim.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.42)
- Health & Medicine (0.93)
- Education > Educational Setting > Preschool (0.64)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.62)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.48)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.57)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.64)
D ej ` a vu Memorization in Vision-Language Models
Vision-Language Models (VLMs) have emerged as the state-of-the-art representation learning solution, with myriads of downstream applications such as image classification, retrieval and generation. A natural question is whether these models memorize their training data, which also has implications for generalization. We propose a new method for measuring memorization in VLMs, which we call d ej ` a vu memorization . For VLMs trained on image-caption pairs, we show that the model indeed retains information about individual objects in the training images beyond what can be inferred from correlations or the image caption. We evaluate d ej ` a vu memorization at both sample and population level, and show that it is significant for OpenCLIP trained on as many as 50M image-caption pairs. Finally, we show that text randomization considerably mitigates memorization while only moderately impacting the model's downstream task performance.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
MemoryScalingPaperCameraReadyMain
We again notice that larger models memorize training data faster. This section shows how perplexity and memorization on the special batch evolve over training. Figure 14 we see that perplexity continues to increase over training, while memorization flatlines. We show plots for the 1.3B model scale, although all of the experiments in 5 exhibit ( T 1) Figure 16 we analyze the average memory unit length over training for two model sizes. We notice that the larger 2.7B model has an average Exact training time varied depended on model scale and dataset size, but all models were trained for up to 140 hours.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > California (0.04)
- (2 more...)
- Information Technology > Security & Privacy (1.00)
- Law (0.93)
- Health & Medicine (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.82)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.50)